I have been developing a strategy for using instance
variables that you might find helpful. This strategy provides
guidance for several common programming tasks, such as
properly initializing instance variables and providing
accessors to use them. It shows how to implement equality
methods and helps guide the initial decisions in making an
object persistent. Finally, it explains why the instance
variables in various application layers tend to behave
differently.

Although this strategy probably isn't perfect, it is one
that I find useful. The strategy doesn’t consist of hard and
fast rules you should always obey, just suggestions you
should consider and trends you can look for. I can’t
guarantee that following these guidelines will make you a
better programmer, but they should help.

TYPES OF INSTANCE VARIABLES 

I’ve noticed that not all instance variables are created
equal. Some seem to be more important than others.
When using instances of a particular class, I notice that
I'm constantly inspecting certain instance variables to
make sure their values look reasonable, yet I consistently
ignore other instance variables. So I’ve been trying to
figure out how to distinguish the important ones from
the unimportant ones.

In looking at how I use instance variables, I’ve found
that there are three types, which I call identity, status,
and cache. When looking at a new class, I try to distinguish
these types to help figure out how the class works.
When one of my own classes doesn’t work well, I look at
how I'm using these types. Often I find inconsistencies,
and when I clean those up, the class works better. As I help
other people develop their classes, I look for these types.
If possible, I encourage the developers to identify each
instance variable’s type and use it "correctly."

I describe the three types in the following subsections. 

Identity variables 

Identity variables are how you distinguish two instances
of a class. If two objects have the same identity values,
they represent the same entity. Once an identity value is
set, it usually doesn't change. After all, if you recognize an
object because it has a certain identifier, and that
identifier changes, how will you recognize it again next time?
An object’s identity values must be set for the object’s state
to be valid. Also, there are usually no good default values for
identity variables: multiple objects with the same default
values would be indistinguishable. Examples of typical
identity variables include uniqueID, name, and a tree
node’s parent.

Status variables 

When developers talk about instance variables (the variables
that maintain an object’s state and are accessed through
getter and setter methods), they’re usually talking about
what I call "status variables." Status variables maintain
an object’s internal state and its relationships to other
objects. These relationships may be aggregate or
associative. Whereas identity values don't change, status
values change constantly to reflect the object’s changing
state. Like identity variables, status variables must be set
in order for the object’s state to be valid; otherwise its
internal state is undefined and inconsistent. If a status
value is lost (set to an invalid value such as nil), the
object's state cannot be recovered. Finally, status variables
have suitable default values. (If nothing else, nil can be
used as the default value, but that’s often not a very good
one. See my previous discussion on the Null Object
pattern [1].) Taken together, these default values describe the
object’s initial state. Examples of status variables include
address, employer, and a tree node’s children, as well as
the various settings represented on a GUI using check
boxes, radio buttons, etc.

Cache variables 

Cache variables cache the results of expensive calculations.
Their values are derived from the values of identity and
status variables. When those values change, the cache
values must be recalculated, so cache values change as
frequently as the values they are based on. Cache values
are optional; the object’s state is still valid without them.
If a cache value is lost, it can easily be recalculated. A
cache variable’s default value is usually uncalculated, a
flag indicating that the value hasn’t been calculated yet.
The most common flag for uncalculated is nil, but there
can be other such flags. For an example of a cache variable
in VisualWorks, see CompositePart»preferredBounds. A
composite calculates its preferred bounds by merging those
of its components; it caches the result for efficiency.

RAMIFICATIONS 

These definitions are comforting, but they alone don't 
make your code any better. Yet you can improve your 
code by recognizing these types and writing your code 
accordingly. 

Initialization 

There are three approaches to initializing a variable:

1. Let a collaborator set its value explicitly. 

2. Set its value to a default constant. 

3. Set its value to the result of a calculation.

Each of these approaches is used to initialize a different
type of instance variable:

1. Identity initialization: initializes the identity variables.

2. Creation initialization: initializes the status variables.

3. Lazy initialization: initializes the cache variables.

Identity variables are initialized by the collaborator that
creates the object. The collaborator should accomplish
this via an instance creation method on the class side.
Two examples of instance creation methods in VisualWorks
(besides the standard ones like new, basicNew, and new:)
are Point class»x:y: and DependentPart class»model:.
An instance creation method on the class side should be
implemented via a corresponding identity initialization
method on the instance side. For example,
Point class»x:y: uses the identity initialization method
Point»setX:setY: to create the new instance:

Point class»x: xInteger y: yInteger
    ^self basicNew setX: xInteger setY: yInteger

Point»setX: xPoint setY: yPoint
    x := xPoint.
    y := yPoint

The instance creation methods in Circle and Interval are
implemented the same way. I prefer to name this identity
initialization method init..., so the name I would have
used for Point»setX:setY: would have been initX:y:. I put
these methods in the "initialize-release" protocol.

Status variables should be initialized to their default
values when the new instance is created. The standard
name for the method that performs creation initialization
is "initialize." VisualWorks has tons of examples of
this, such as SortedCollection»initialize. Another example
is OrderedCollection»setIndices; it isn’t called
"initialize," but it should be because it serves the same
purpose.

Cache variables do not need to be initialized until
they are used. In fact, initializing them is usually
expensive and should be avoided until you know the values are
needed. The easiest way to do this is to build lazy
initialization into their accessors. VisualWorks doesn’t use
this technique much, but two examples are
CompositePart»preferredBounds and SliderView»marker. You might
implement Circle with radius as an identity variable and
diameter and area as cache variables:

Circle»radius
    ^radius

Circle»diameter
    "Lazy initialization: compute the caches on first use."
    diameter isNil ifTrue: [self computeProperties].
    ^diameter

Circle»area
    area isNil ifTrue: [self computeProperties].
    ^area

Circle»computeProperties
    "Calculate both cache values from radius in one pass."
    | r |
    diameter := radius * 2.
    r := self radius asLimitedPrecisionReal.
    area := r class pi * r * r

Developers often use lazy initialization with variables that
are not caches, but I avoid this. Although caches are
expensive to initialize, other variables usually aren’t, so I see
no compelling advantage in using lazy initialization on
those other variables.

Often status variables are initialized in terms of identity
variables, which means that an identity initialization
method (in the form of initA:b:...z:) has to be run
before the creation initialization method. Here’s a
hypothetical example of an instance creation method that
will do this:

Example class»x: newX y: newY
    ^(self basicNew initX: newX y: newY) initialize

HelpBrowser class»on: is implemented this way because
HelpBrowser»initialize ends up using the value of on:’s
parameter.
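
The ordering described above can be tried out with a small runnable sketch. It is written in GNU Smalltalk's bracketed file syntax, which is an assumption made only so the example runs as a script (in a VisualWorks browser each method would be entered separately). The class Account and its variables owner and title are hypothetical names, not part of any library.

```smalltalk
"Hypothetical class: owner is an identity variable, title a status variable."
Object subclass: Account [
    | owner title |

    Account class >> owner: aString [
        "Identity initialization runs first, then creation initialization."
        ^(self basicNew initOwner: aString) initialize
    ]

    initOwner: aString [
        "Identity initialization: the collaborator supplies the value."
        owner := aString
    ]

    initialize [
        "Creation initialization: this default is derived from the identity value."
        title := 'Account of ' , owner
    ]

    owner [ ^owner ]
    title [ ^title ]
    title: aString [ title := aString ]
]

Eval [
    | account |
    account := Account owner: 'Pat'.
    "displays: Account of Pat"
    account title displayNl
]
```

If initialize ran before initOwner:, owner would still be nil when the default title is built, which is why the instance creation method fixes the order.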

Accessing 

Developers often automatically create getter and setter
methods for all of their instance variables and put them in
a public protocol like "accessing." I prefer to be a little
more selective and only create accessors for certain types
of instance variables.

Identity variables need getters but no setters. The getters
may be public or private. Setters are usually not necessary
because the identity variables’ values typically don’t
change. The only "setter" that is required is the identity
initialization method (initA:b:...z:). Any setters you do
provide should definitely be private. Status variables use
getters and setters in the conventional manner. These
methods can be public or private.

Cache variables have getters but no setters. The getters,
which can be public or private, contain lazy initialization. I
prefer to implement the lazy initialization via a compute...
method, as shown earlier in Circle»computeProperties. If the
calculations for one cache variable calculate others in the
process, group the initialization for all of those variables
together in one compute... method. Don't implement setters;
they could be used to set the caches to values that are
inconsistent with the object’s state. Instead of setters, I
implement flush... methods, which reset the variables back to
their uncalculated state (usually nil). If one change
invalidates a number of caches, I flush them all in one method.

For example, let’s say that the Circle described earlier
caches both diameter and area and that radius can
change. Some more of the code would be

Circle»radius: newRadius
    radius := newRadius.
    self flushProperties

Circle»flushProperties
    diameter := nil.
    area := nil

The compute... and flush... methods are private ones. The
cache getter methods with the lazy initialization send the
compute... methods (see Circle»diameter). The setter methods
for the status (and identity) variables send the flush...
methods (like Circle»radius:). A particular setter does not
need to flush all of the object’s cache variables, only the
ones that were calculated from it.

Equality versus identity

In my previous article, I talked about the difference
between object identity and object equality. Object identity
is very clear cut. If two variables contain identical
objects, they are double-equal, which means that they
both point to the same address in memory. Thus the two
variables actually contain the same object.

Object equality is not so straightforward. If two variables’
values are equal but not identical, they contain separate
objects that are equivalent. The question is: What
makes objects equivalent? In theory, they represent the
same value. In practice, for Smalltalk, it means that a Set
considers them to be duplicates.

I contend that two objects are duplicates if their identity
variables are equal; their status and cache values are
irrelevant. Because identity values rarely or never change,
this means that two objects that are sometimes equal are
always equal. Changes in their status don't affect their
equalness. Thus if one object is a duplicate of another, it
will be so through its entire lifetime, which is how it
should be.

Just as implementors of equal (=) use identity variables,
so do implementors of hash. If two objects are equal, their
hash values need to be the same. So the same variables
which are used for determining equality are also used for
calculating hash values.

Persistence

When an object needs to store itself persistently, it
shouldn't necessarily store all of its instance variable
values the same way. Some instance variable types are
persistent, others are not.

When storing an object in a relational database, its identity
values belong in the database table’s key columns. Just
as identity variables should uniquely identify an object, a
row’s key column values should be unique from other rows.
Status variables that represent state have simple values
that are stored directly in table columns. Those maintaining
relationships to other objects become database joins.
There is generally no need to store cache values
persistently. Rather than consume database space, just
recalculate them after reading the object out of the database.
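
As a rough illustration of this mapping, the sketch below splits an object's variables into key values, column values, and an unstored cache. It uses GNU Smalltalk's bracketed file syntax (an assumption, chosen so the example runs as a script). The class Employee, its variables, and the selectors keyColumns, dataColumns, and fromKey:data: are all hypothetical, and plain Dictionaries stand in for a database row.

```smalltalk
"Hypothetical class: id is identity, address is status, label is a cache."
Object subclass: Employee [
    | id address label |

    Employee class >> id: anInteger [
        ^(self basicNew initId: anInteger) initialize
    ]

    Employee class >> fromKey: keys data: data [
        "Rebuild from a stored row; the cache is recalculated on demand."
        ^(self id: (keys at: #id)) address: (data at: #address); yourself
    ]

    initId: anInteger [ id := anInteger ]
    initialize [ address := 'unknown' ]

    id [ ^id ]
    address [ ^address ]
    address: aString [ address := aString. self flushLabel ]

    label [
        label isNil ifTrue: [ self computeLabel ].
        ^label
    ]

    computeLabel [ label := id printString , ': ' , address ]
    flushLabel [ label := nil ]

    keyColumns [
        "Identity values go in the key columns."
        ^Dictionary new at: #id put: id; yourself
    ]

    dataColumns [
        "Status values go in ordinary columns; the label cache is not stored."
        ^Dictionary new at: #address put: address; yourself
    ]
]

Eval [
    | original restored |
    original := Employee id: 7.
    original address: '1 Main St'.
    restored := Employee fromKey: original keyColumns data: original dataColumns.
    "displays: 7: 1 Main St"
    restored label displayNl
]
```

The restored object never receives a stored label; its first label request recomputes it from the identity and status values that were saved.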

The storage issues for an object database are similar to
those of a relational one. An object’s identity values serve
as its keys for retrieving it from the database. Status
values are simply stored with the object. And cache values
do not need to be stored at all, although they can be for
completeness.

Database proxies also make use of instance variable
types. A proxy must contain the identity values for its real
object. That way it will be able to load the real object out of
the database. Because a proxy is supposed to be
lightweight, it shouldn't contain status or cache variables.
Ideally, as much of the proxy’s behavior as possible will be
implemented just using the identity values. This will help
maximize the amount of work the proxy can perform and
minimize the number of real objects that need to be read
from the database.

Dictionaries, Smalltalk objects that act somewhat 
like simple databases, also make use of instance variable 
types. Each element is stored in a Dictionary by a key that 
must be unique. That key is often an identity variable. 
That variable’s value must not change while the element 
is stored in the Dictionary. Thus an identity variable makes 
a much better key than a status variable does. 
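
The guidelines for =, hash, and keys can be sketched together. The example below is in GNU Smalltalk's bracketed file syntax (an assumption, so that it runs as a script). Customer, name, and address are hypothetical names, with name treated as the identity variable and address as a status variable.

```smalltalk
"Hypothetical class: name is identity, address is status."
Object subclass: Customer [
    | name address |

    Customer class >> name: aString [
        ^(self basicNew initName: aString) initialize
    ]

    initName: aString [ name := aString ]
    initialize [ address := 'unknown' ]

    name [ ^name ]
    address [ ^address ]
    address: aString [ address := aString ]

    = anObject [
        "Equal when the identity values are equal; status is ignored."
        ^self class == anObject class and: [ name = anObject name ]
    ]

    hash [
        "Built from the same variable that = compares."
        ^name hash
    ]
]

Eval [
    | a b customers registry |
    a := Customer name: 'Pat'.
    b := Customer name: 'Pat'.
    b address: '1 Main St'.

    "A Set treats the two as duplicates even though their status differs."
    customers := Set new.
    customers add: a; add: b.
    "displays: 1"
    customers size displayNl.

    "Keyed by the identity value, the entry survives status changes."
    registry := Dictionary new.
    registry at: a name put: a.
    a address: '9 Elm St'.
    "displays: 9 Elm St"
    (registry at: 'Pat') address displayNl
]
```

Had the Dictionary been keyed by address instead, changing the address would leave the element filed under a stale key.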

Application layering 

A Smalltalk program contains four main layers: view,
application (mediator), domain, and infrastructure [2].
Most of the variables in application models and view
objects are status variables. Identity variables are
concentrated in domain objects. Infrastructure objects tend not
to contain much state at all; they mostly point to domain
objects in some way (which can be an identity or status
relationship).

Exceptions 

These guidelines are not rules that are engraved in stone.
Identity values can change during an object’s lifetime. It’s
sometimes helpful for an instance creation method to
initialize some status variables. A proxy may want to contain
certain status values because they’re used so often.
However, I try to stick to these guidelines when possible. When
I make an exception, I like to have a good reason.

Here are some interesting exceptions to these guidelines
that I’ve found in VisualWorks.

Set’s tally variable: Its behavior is that of a cache. If its
value were ever lost, it could easily be recalculated.
However, it's implemented as a status variable. That is
because its value only changes by ±1 each time, a simple
and well-defined transformation on the old value. For
a large Set, it is much easier to add or subtract 1 than to
flush the value and recalculate it from scratch.

Model’s dependents variable: Its behavior is that of a typical
status variable. However, when storing a Model persistently,
this variable must be treated specially. Dependents are
usually transient and thus are not stored when their parent is.

Point’s x and y variables: Are these identity variables or
status? Once a Point is created, can its x and y values change?
Generally, changing their values is a bad idea, but there are
plenty of examples where it works just fine. The same goes
for the instance variables in Rectangle, Circle, Date, etc.

OBJECTIONS 

As I discuss these ideas with other developers, I hear
certain objections repeatedly. Here are some of them and my
replies:

"Initialize is expensive": Not if it's used properly. I use
it to initialize status variables, ones which have readily
available default values. If an implementor of initialize is
expensive, it's probably doing more than just
initialization. Which leads to...

"This status variable is expensive to initialize": Then
it’s a cache variable. Cache variables require calculation to
initialize; that’s why they’re lazily initialized. Status
variables are initialized with simple default values that need
no calculation.

"This status variable is hardly ever used": Then get it
out of that object! Every time you instantiate an instance of
that class, you’re sucking up memory for variables that
probably won’t be used. If there are a number of these
variables, you’re wasting a lot of memory. Refactor the class
into two or more classes that separate the variables that are
usually used from those that usually aren’t. By the way,
each of the pointers to these optional separate objects is a
status variable, but it can be implemented as a cache.

"Lazy initialization is more efficient": Not for identity
and status variables. They’re only initialized once. Why
have the getters check every time to make sure they’re
initialized? They already have been. Lazy initialization is fine
for cache variables because they get flushed periodically.
But for identity and status variables, you always use them,
so initialize them once and get it over with.

A WELL-DESIGNED OBJECT

Let’s take a look at how you would use these guidelines to
design a class. First of all, we assume that the class’s
implementation requires a number of instance variables.

• Some of their values are computed from the values of
others. These are cache variables.

• Some are required as part of the object’s state and have
suitable default values. These are status variables.

• Some others are also required but do not have good
default values. The object’s collaborators must set these
values when they create the object. These are identity
variables.

Once you’ve established these designations for your
variables, follow the other guidelines to help implement the
class properly. The identity values should not change.
They should be used in implementors of equal (=) and hash
and as database keys. The cache variables should have
lazy getters as well as flush and compute methods. The
status variables should be used to maintain the object’s
current state.
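
Putting the pieces together, a class designed along these lines might look like the following sketch. It is in GNU Smalltalk's bracketed file syntax (an assumption, so that it runs as a script). LineItem and all of its variables and selectors are hypothetical, and the multiplication merely stands in for a calculation expensive enough to be worth caching.

```smalltalk
"Hypothetical class: sku is identity, quantity and unitPrice are status,
 total is a cache."
Object subclass: LineItem [
    | sku quantity unitPrice total |

    LineItem class >> sku: aString [
        ^(self basicNew initSku: aString) initialize
    ]

    initSku: aString [ sku := aString ]

    initialize [
        "Status variables get simple default values."
        quantity := 1.
        unitPrice := 0
    ]

    "Identity: getter only."
    sku [ ^sku ]

    "Status: getters, plus setters that flush the dependent cache."
    quantity [ ^quantity ]
    unitPrice [ ^unitPrice ]
    quantity: anInteger [ quantity := anInteger. self flushTotal ]
    unitPrice: aNumber [ unitPrice := aNumber. self flushTotal ]

    "Cache: lazy getter, no setter."
    total [
        total isNil ifTrue: [ self computeTotal ].
        ^total
    ]

    computeTotal [ total := quantity * unitPrice ]
    flushTotal [ total := nil ]

    "Equality and hash use only the identity variable."
    = anObject [ ^self class == anObject class and: [ sku = anObject sku ] ]
    hash [ ^sku hash ]
]

Eval [
    | item |
    item := LineItem sku: 'A-100'.
    item quantity: 3; unitPrice: 5.
    "displays: 15"
    item total displayNl.
    item quantity: 4.
    "displays: 20"
    item total displayNl
]
```

The second total is correct only because quantity: flushes the cache; without the flush the getter would keep answering 15.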

CONCLUSIONS 

Here are the main points in this article: 

• There are three types of instance variables: identity,
status, and cache.

• Identity values don't change, status values do, and cache
values are calculated from identity and status values.

• Each type is initialized differently: identity
initialization from collaborators, creation initialization, and
lazy initialization.

• Identity variables are used for =, hash, and as
dictionary and database keys.

• Status variables store an object’s state and
relationships to other objects.

• Cache variables require flush and compute methods.

• These are guidelines only; there are exceptions.

In my next article, I'll talk about how to display an object
as a String. It turns out that identity variables are very
helpful for doing this.